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Abstract 


We propose a framework for indexing of grain and sub-grain structures in electron backscat- 
ter diffraction (EBSD) images of polycrystalline materials. The framework is based on a 
previously introduced physics-based forward model by [Callahan and De Gr^ (2013) re¬ 
lating measured patterns to grain orientations (Euler angle). The forward model is tuned 
to the microscope and the sample symmetry group. We discretize the domain of the for¬ 
ward model onto a dense grid of Euler angles and for each measured pattern we identify 
the most similar patterns in the dictionary. These patterns are used to identify boundaries, 
detect anomalies, and index crystal orientations. The statistical distribution of these clos¬ 
est matches is used in an unsupervised binary decision tree (DT) classiher to identify grain 
boundaries and anomalous regions. The DT classihes a pattern as an anomaly if it has an 
abnormally low similarity to any pattern in the dictionary. It classifies a pixel as being near 
a grain boundary if the highly ranked patterns in the dictionary differ signihcantly over the 
pixels 3x3 neighborhood. Indexing is accomplished by computing the mean orientation of 
the closest dictionary matches to each pattern. The mean orientation is estimated using a 
maximum likelihood approach that models the orientation distribution as a mixture of Von 
MisesFisher distributions over the quaternionic 3-sphere. The proposed dictionary matching 
approach permits segmentation, anomaly detection, and indexing to be performed in a uni- 
hed manner with the additional beneht of uncertainty quant ideation. We demonstrate the 
proposed dictionary-based approach on a Ni-base INIOO alloy. 


Part of this work was reported in the Proceedings of the IEEE International Conference on Image 
Processing (ICIP), Melbourne Australia, Sept 2013. 
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1 Introduction 


Electron backscatter diffraction, EBSD, is used to perform quantitative microstructure anal¬ 


ysis of polycrystalline materials on a millimeter to nanometer scale (Schwartz et ah (2009)). 


Most current EBSD segmentation (i.e., delineation of individual grains by determination of 
the grain boundary locations) and indexing (i.e., orientation determination) methods ex¬ 
tract orientations and widths of Kikuchi bands in a measured pattern by using a modihed 
Hough transform, implemented using image processing tools such as butterfly convolution, 
Gaussian hltering, binning, peak detection, and image quality maps to gauge indexing and 


segmentation accuracy (Tao and Eades (2005)), (Wright and NoweII| (2006)). By compar¬ 


ing the measured diffraction line parameters to a pre-computed database, indexing yields 
the crystal orientation, commonly described by three Euler angles with respect to a refer¬ 
ence frame, for the volume illuminated by the beam. By repeating the process on a grid 
of scanning locations on the sample, an orientation map or image is produced. The image 
is then segmented into grains by thresholding normed differences between the Euler angles 
(misorientations). The accuracy of the Hough approach to EBSD indexing depends to a 
large extent on the visibility of the Kikuchi bands, which is often represented in terms of an 
’’image quality” parameter. 
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In this paper we propose an alternative indexing approach that uses a physics-based 
forward model for the full diffraction patterns and does not require application of the 
Hough transform or other image processing tools. The proposed approach exploits the 
known physics of electron scattering phenomena that underlies the image formation process. 
With this additional information, grain boundaries and anomalous points can be detected as 
explicit classes at the same time as grains are segmented. Anomaly detection (i.e., the auto¬ 
mated detection of abnormal or unexpected diffraction patterns) is an important capability, 
since anomalies may correspond to defects or contaminants that affect the material prop¬ 
erties. In addition, unlike methods based on the Hough transform, the proposed pattern 
dictionary approach differentiates grain interiors from grain boundaries without requiring 
additional processing of the measured patterns. 

Automated indexing of electron diffraction patterns by means of pattern matching tech¬ 


niques is not new, and was proposed in 1991 by Wright and coworkers (Wright et ah (1991)) 
for backscattered Kikuchi diffraction patterns, now commonly known as electron backscatter 
diffraction patterns or EBSPs. Their approach involved automated comparisons between a 
series of experimental patterns and idealized patterns, created from a set of orientations 
that uniformly covers the asymmetric region (or fundamental zone) of Euler space. While 
the technique showed promising results, computer limitations prevented the approach from 
gaining widespread acceptance. In the context of precession electron diffraction in the trans¬ 


mission electron microscope, Rauch and coworkers (Rauch and Dupuy (2005); Rauch et ah 


(2008)) proposed a template-based pattern matching approach for the automatic orientation 
determination of quasi-kinematical diffraction patterns. The templates are computed from 
kinematical structure factors, and the proper asymmetric part of Euler space is sampled uni¬ 
formly to generate a template collection, which is then compared quantitatively against the 
experimental patterns. In this template matching process, only the top match is considered 
in the determination of the crystal orientation. As we will explain in detail in the present 
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paper, our dictionary appproach uses, in addition to a physics-based model for the gener¬ 
ation of the dictionary, a statistical model for the orientation distribution of all the highly 
ranked pattern matches to provide both a stable and robust estimate of orientation, as well 


as a quantitative statistical uncertainty; the method proposed in Rauch and Dupuy (2005) 
does not perform such a statistical analysis. The current commercially available EBSD 
indexing suites also lack a statistical determination of the uncertainties in the orientation 
determination. 

The proposed indexing framework relies on two components: offline dictionary generation 
and online dictionary matching to the sample. Offline dictionary generation is accomplished 
as follows. First, a dictionary of raw diffraction patterns is generated using the forward 


model of Callahan and De Graef (2013). This dictionary is tuned to the parameters of 
the microscope and the crystal symmetry group(s) of the sample. Second, the singular 
value decomposition (SVD) of the dictionary is computed and a second dictionary, called 
the dictionary of background compensated patterns, is generated by projection of the raw 
dictionary onto the space orthogonal to the hrst principal component; this is essentially a 
background subtraction process. 

The hrst step in the online dictionary matching algorithm is to compute normalized inner 
products between the uncompensated (raw) sample patterns and the uncompensated dictio¬ 
nary. This is repeated on the compensated sample patterns and the compensated dictionary 
patterns. The basis for the online dictionary matching algorithm is the construction of a pair 
of bipartite graphs, uncompensated and compensated, respectively, using these normalized 
inner products. A bipartite graph is a graph connecting vertices or nodes in two disjoint 


sets (Diestel (2005)); in our case, the hrst set contains the experimental patterns at diher- 
ent locations on the sample, the second the dictionary patterns for diherent orientations. 
For each spatial location on the sample, the top k (normalized) inner prodncts between the 
sample dihraction pattern and a dictionary determine the fc-nearest-neighbor (kNN) neigh- 
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borhood of the pattern in the dictionary. These kNN neighborhoods form the backbone of 
the proposed online dictionary matching algorithm. 

From the bipartite graphs, sample patterns can be classihed as grain interiors, grain 
boundaries, or anomalies, using an unsupervised decision tree (DT) classiher dehned on the 
graphs. Specihcally, the classiher uses the shapes of these kNN neighborhoods to discrimi¬ 
nate between these types of patterns. A tightly clustered and connected kNN neighborhood 
indicates a grain interior sample pattern. A kNN neighborhood that forms two or three clus¬ 
ters in the dictionary indicates a grain boundary sample pattern. kNN neighborhoods that 
are very spread-out and scattered indicate anomalous sample patterns. The unsupervised 
DT classiher uses the uncompensated dictionary to distinguish unusual (anomalous) back¬ 
ground patterns. The compensated dictionary is used to distinguish non-anomalous patterns 
as interior to a grain or on the boundary of a grain. A pixel is classihed as ’’anomalous”, 
’’grain interior” or ’’grain boundary” using an unsupervised decision tree (DT) classiher to 
test homogeneity of the pattern matches over a 3 x 3 spatial patch centered at the pixel. 
The ehect of surface roughness, and the resulting shifts in the EBSD background intensity, 
is a topic of ongoing analysis. 

Indexing of crystal orientation is performed using a maximum likelihood (ML) estimation 
strategy for determining the Euler angles. The ML strategy hts a Von Mises-Fisher mixture 
(VMFm) density model to the observed distribution of the Euler angles of the top dictionary 
matches; the von Mises-Fisher distribution is a probability distribution on a sphere in a p- 
dimensional space. An iterative expectation-maximization (EM) estimation algorithm is used 
to perform the ht. The use of the VMFm model allows one to account for the symmetry- 
induced ambiguities of the crystal orientation and produces an estimate of the Euler angles 
in the desired fundamental zone. A side beneht is that the algorithm yields an estimate of 
the spread of the VMEm model that can be used as an a posteriori conhdence measure on 
the orientation estimate. 
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To the best of our knowledge, the framework proposed here is the first EBSD indexing 
approach that uses a dictionary generated by a physics-based forward model. Some advan¬ 
tages of our model-based approach are: 1) it incorporates the physics of dynamical electron 
scattering; 2) it unifies segmentation, indexing and anomaly detection; 3) it incorporates a 
statistical model that naturally generates both an estimate of orientation and a measure of 
conhdence in the estimate; 4) it involves parallelizable operations relying on simple inner 
products and nearest neighbor search. At the same time, the large size of the dictionary, 
together with the high dimension of the diffraction patterns, create computational challenges 
as discussed in Sec. [H 

The outline of the chapter is as follows. Section [^presents the proposed dictionary model 
for EBSD pattern classihcation and indexing. Section develops the statistical algorithms 
for anomaly detection, segmentation, and indexing based on the dictionary model. Section]^ 
describes the dictionary generation process. Section presents the experimental methods 
for generating the EBSD samples used in Sec. Section presents the results of applying 
the proposed dictionary-based classihcation and indexing to a Ni-base INIOO alloy. Finally, 
Section summarizes the paper and points to future directions. 


2 Dictionary Model 


This section describes the non-linear forward model of Callahan and De Graef (2013). It then 


shows how a physics-based dictionary of diffraction patterns can be used to approximately 
linearize the forward model. For descriptive economy, throughout this paper we denote 
by a ’’pixel” a particular scan location on the sample surface; with each pixel, there is an 
associated EBSD pattern acquired at the sample location. 
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2.1 Forward model 


When the beam is focused on a grain within the sample, the measured backscatter diffraction 
pattern, Y, on the detector surface can be expressed as a function of the crystal orientation 
6, parameterized by an Euler angle triplet ((pi, $, (^ 2 ), the incident electron energy, E, and 
interaction depth, zq, as: 

Y = H{V{6,E,zo)) + N, (1) 


where ii{V{6, E, zq)) is a forward model for the backscatter process and N is detector 
noise. Here H is a measurement operator that accounts for the instrument geometry and 
sensitivity, and V represents the thickness and energy averaged mean backscatter yield. 
An accurate forward model for this yield was proposed by Callahan and De Graef (2013). 
The model employs Monte Carlo simulations to obtain the energy, spatial, and exit depth 
distributions of the backscattered electrons. This statistical information is then used to 
compute a series of dynamical ”EBSD master patterns” as a function of the electron exit 
energies and directions. The master pattern represents all possible EBSD patterns for a given 
exit energy, and specification of the grain orientation 6 along with the detector geometry 
then leads to an actual EBSD pattern for that orientation. These three steps constitute the 
forward model V. The measurement operator H includes the scintillator-to-CCD conversion 
process in the form of a point spread function for the coupling optics, as well as detector 
quantum efficiency and CCD binning mode. In the remainder of this paper, we will refer to 
the depth and energy averaged diffraction pattern generated by the forward model described 
above as the mean diffraction pattern. 

As the electron beam is scanned across the sample surface, the diffraction pattern will 
change due to changes in the local crystal orientation 6 between homogeneous grains. In 
grain interiors, the forward model ([^ can be used to produce estimates of crystal orientation 
at each scan location. Elsewhere in the sample, e.g., near boundary regions or near locations 


Callahan and De Graef (2013 





of anomalous features, the model ([^ will no longer be a good fit to the measured patterns. 
Thus, the goodness of fit of the forward model can be used to classify grains, grain boundaries, 
and anomalies in the sample. This forms the basis for our proposed use of the forward-model 
to perform classification and indexing. 

2.2 Sparse dictionary-based forward model 

In principle one could formulate classification and indexing as a non-linear inverse problem 
using the full forward model Q. For example, given a noise model for N one could perform 
maximum likelihood estimation to solve the indexing problem and likelihood-ratio testing to 
solve the classification problem. In the special case of a Gaussian noise model both solutions 
would require solving the non-linear least-squares problem min^ ||Y — H(P(0, 2;o))||^, in 

which the Euclidean norm squared of the residual htting errors is to be minimized with 
respect to the orientation 6. Here, we take a simpler approach to the inverse problem that 
leads directly to tractable indexing and classihcation algorithms. 

Let V denote a precomputed dictionary of mean diffraction patterns obtained by densely 
sampling the function li(V{9, E, zo)) over the range of orientations 9, keeping E and zq 
hxed. Assume that the size of V is d. Then, for sufficiently dense samples, the model ([^ 
can be approximated by 

d 

Y = ^Xi0, + N, (2) 

i=l 

where cj)i E V are dictionary elements and Xi are coefficients. When Y corresponds to the 
measured pattern at a location within a grain, one might expect that only a few Xj’s will 
be non-zero, i.e., the representation ([^ is sparse. In particular, as the dictionary becomes 
increasingly dense the sparsity of the representation will also increase and, in the limit, x* 

will become a delta function Xi = A{i — j a where j is the index of the true orientation 
‘^^The Delta function A{i) is equal to 1 if z = 0 and is equal to 0 for other i. 
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at that location. Note that when N = 0, in this limiting case (of a fully dense dictionary) 
errorless estimation of 0 can be accomplished by hnding the index i which yields the largest 
normalized inner product p(Y, (j)i) = where, for two diffraction pattern A = {{Aim)) 

and B = {{Bim)), {A, B) = Yhim^irnBim) where I and m index the vertical and horizontal 
locations on the photodetector. The signihcance of this fact is that we have simplihed the 
solution of a complicated non-linear least squares problem to the solution of a linear least 
squares problem followed by a table lookup (matching an index of the dictionary to the 
associated Euler angle). 

In the practical case of a hnite dictionary there will not be an exact match to the true 
diffraction pattern and ([^ is interpreted as a model that interpolates over the patterns in 
the dictionary, with interpolation coefficients For the purposes of indexing and 

classihcation we will restrict ourselves to sparse models, where only a few {k) of these coef- 
hcients are non-zero, i.e., using only a small number of dictionary elements to £t (|^. This 
sparse approximation problem is a well studied mathematical problem with many different 
iterative algorithms available for identifying the few non-zero coefficients. The brute force 
algorithm that tries to find the best £t over any set of k dictionary elements is intractable 
except for very small values of k. Alternatives include basis pursuit methods such as or¬ 
thogonal matching pursuit (OMP), stepwise OMP (stOMP), compressive sampling OMP, 
iterative soft thresholding (1ST), and convex optimization relaxation methods such as li 


minimization using active set, interior point, or subgradient methods (Tropp and Wright 


(2010)). In the EBSD application, the sparse approximation has to be performed for each 
and every pattern measured on the sample. Even with a relatively modest dictionary size 
and low sample resolution, e.g., d = 100, 000 elements and n = 100, 000 scan locations, these 
methods are computationally heavy. 

We have adopted a simpler correlation matching approach with significantly reduced 
computation requirements. Instead of htting the model ([^ through least squares we simply 


10 





use inner products to find the k top matches between the dictionary and the observations. 
Specihcally, for each measured pattern Y we compute the normalized inner products (cor¬ 
relation) p(Y, 0i),p(Y, between Y and the dictionary and rank them in order of 
decreasing magnitude. The top correlation matches are the k dictionary patterns having 
the highest inner product^ These k patterns constitute and pixel’s fc-nearest-neighbor 
(fc-NN) neighborhood in the dictionary. The collections of fc-NN’s dehne a bipartite graph 
connecting measured patterns to patterns in the dictionary. In general, these connections 
will be one-to-many, i.e., for each experimental pattern there will be a small number of near 
matches in the dictionary. The /c-NN neighborhoods will be used to perform classihcation 
and indexing as described in Sec. 

In addition to the dictionary P, referred to as the uncompensated dictionary, we will use 
a derived dictionary of compensated patterns Vc to cluster the observed patterns into classes 
that can then be used for anomaly detection, segmentation and indexing. The compensated 
dictionary Vc consists of patterns in V after projecting away the background. This is per¬ 
formed by applying the singular value decomposition to determine the principal component, 
which closely resembles the population mean of the patterns in and projecting the dictio¬ 
nary onto the space orthogonal to the principal component (i.e., removing the mean pattern 
from each individual pattern). This compensation process, which simply removes the back¬ 
ground common to all patterns, improves the dictionarys ability to discriminate between 
diffraction patterns. However, the uncompensated dictionary will also be used since it can 

better discriminate anomalies that are primarily manifested in the background. 

‘^^Note that these top k inner product matches will be identical to the dictionary elements selected by k 
iterations of OMP in the case that the dictionary is orthogonal. 
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3 Classification and Indexing 


The proposed correlation matching approach to classihcation procedes in two steps. First 
inner products between the observed patterns and patterns in the dictionaries T> and Vc are 
computed. For each observed pattern we only store its k closest dictionary matches, i.e., those 
dictionary patterns with the k highest inner products with respect to the observed pattern. 
Then a pixel is classihed as a grain interior, a grain boundary, or an anomaly using a decision 
tree classiher applied to the set of top pattern matches in the dictionary. The proposed 
correlation matching approach to indexing pixel orientation computes an estimate of the 
mean orientation over the top matching patterns in the dictionary. In order to account for 
noise and the non-euclidean nature of the Euler sphere, the mean angles are estimated using 
a specially adapted maximum likelihood estimator, introduced in the indexing subsection 
below. Pixel classihcation and indexing are performed independently and are discussed 
separately. 


3.1 Classification 


Classihcation of a pixel is performed by evaluating the inner products between the pixels 
uncompensated and compensated patterns and patterns stored in the dictionaries V and 
Vc, respectively. Anomalies are detected as abnormally low average inner products between 
an uncompensated pixel pattern and patterns in V. Specihcally the average inner product 
similarity measure is dehned as 




( 3 ) 


where is the average of the normalized inner products between pattern Yj at pixel i and the 
d pattern in the dictionary V. 

Boundaries between grains are detected based on the lack of homogeneity of the matches 
to the dictionary of a pixel and its 8 adjacent pixels, which we call a 3 x 3 spatial 
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patch. Specifically, for pixel i we define the neighborhood similarity measure 

average amount of overlap between the fc-NN neighborhood in Vc of the pixel and the fc-NN 

neighborhoods, with k = 40, in Vc of the adjacent pixels on the patch: 


PMAi) = 


Sk 


card{AfkNN{j) P^AfkNN{i)}, (4) 

jeXsxsl*) 

where ^^e indices of the 8 neighbors of pixel i. The neighborhood similarity 

PmA^) will have value close to 1 when the image patch is located in a grain. Its value will 
be close to zero when the image patch is centered on an anomaly. Its value will be between 
zero and one when the image patch is at a grain boundary (See Fig. [^. 

The inner product similarities (for anomalies) and the neighborhood similarities (for grain 
boundaries) are used by a pattern classifier to assign each pixel to one of four classes. While 
many different types of unsupervised classifiers could be used, e.g., /c-means, linear discrim¬ 


inant analysis (Hastie et ah (2005)) or deep learning networks (Hinton et ah (2006)), here 
we propose to cluster patterns using an unsupervised Decision Tree (DT) classifier (|Hastie 


et ah (|2005)) whose classification boundaries are determined so that they separate the modes 


(regions of concentration) of the histograms of inner-product similarity and neighborhood 
similarity over the sample. The non-overlapping modes can easily be separated by thresh¬ 
olding of the similarity value while the others can be estimated using a mode decomposition 
method such as Gaussian mixture modeling, also called mixture of Gaussian (MoG) modeling 
(Figueiredo and Jain ( |2002 )). This is illustrated in Fig. The unsupervised DT classifier is 
illustrated in Fig. [^in Sec. [^for the INIOO sample considered. Four clusters were discovered 
by the model: anomalous pixels, which divided into two subclusters of shifted background 
and noisy background, and normal pixels, divided into grain boundary and grain interior. 

Decision tree classifiers have been previously applied to many imaging applications, e.g.. 


land cover classification in remote sensing (Friedl and Brodley (1997)),(Pal and Mather 


(2003)). However, there are significant differences between the proposed DT classifier and 
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Modeling with a Gaussian mixture, optimal boundary-32.537 



Figure 1: A two component Gaussian mixture model has a good fit to the neighborhood 
similarity histogram in right panel of Fig. The point where the two Gaussian compo¬ 
nents cross (dotted vertical line) determines the threshold for the right lower branch of the 
unsupervised decision tree classiher in Fig. 

those previously applied. First, the proposed classiher is a hybrid DT that uses special 
features (background compensated and uncompensated patterns) and similarity measures 
(inner products and neighborhood intersections) specihc to materials microanalysis. Second, 
unlike standard non-parametric DT classihers, the proposed DT is informed by a physics 
model through the generated dictionary of diffraction patterns. Third, our use of unsuper¬ 
vised Gaussian mixture models to determine the classihcation thresholds means that the DT 
classiher threshold parameters are determined by the Gaussian mixture models and do not 
need to be tuned, thus eliminating the need for labeled training data and time consuming 
cross-validat ion. 
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3.2 Indexing 


The proposed pixel indexing method is formulated as a statistical estimation problem. The 
pixel’s crystal orientation is estimated via maximum likelihood under a Von Mises-Fisher 
mixture density model for the EUler angles of the k top dictionary matches. Note that 
the k used in the maximum likelihood model may be different from the k used to compute 
the neighborhood similarity for the DT classiher. We motivate the Von Mises-Fisher model 
as follows. Recall that the dictionary is generated for a set of predetermined orientations 
0. Hence using simple table lookup, the indices of dictionary patterns found in the fc-NN 
neighborhood of a pixel can be mapped to a set of k orientations If the pixel is in a 

grain then these orientation will be clustered around a true crystal orientation, that we call 
0, at the pixel location. We extract a maximum likelihood estimate of this orientation using 
a statistical model for the variation of 0j’s. 

We assume that the k best matching orientations of a pixel form a random sample 

from an underlying marginal density f{]0), supported on the orientation sphere. Then the 
maximum likelihood estimator of 6 is 


6 = argmaxg n 


( 5 ) 


i=i 


As is customary in the theory of directional statistics (Mardia and Jupp ( |1999 )) we 
parameterize the density of 3-dimensional orientations by their equivalent 4-dimensional 
unit length quaternions (llQilh = !)• Due to crystal symmetry, there are many 

(M) orientations that are equivalent to each other, i.e., the Euler angle representation is not 
unique. For any quaternion q, the set of symmetry equivalent quaternions can be represented 
as where is a 4 x 4 quaternion (rotation) matrix and Pq is the identity 

matrix. The matrices are symmetric and constitute an abelian group of actions 

on the 3-dimensional sphere. For example. Nickel has face-centered cubic symmetry, hence 
there are M = 24 symmetry-equivalent orientations. Since the quaternion representation of 
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orientations associates two quaternions (q and — q) to each orientation, we need 48 matrices 
{Pmr^=o to establish a 4D representation of the m3m point group (|De Graef and McHenry] 


(2007)). 


The proposed model for orientations is based on a generalization of the Von Mises-Fisher 
density (Mardia and Jupp ( 1999[ )) to group structured domains on the sphere. The standard 
Von Mises-Fisher density over the 3-dimensional sphere with location parameter and 
precision parameter k is dehned as 


fvMFi^; At, k) = C4(k) exp (kaa^x), ||x ||2 


= 1 


( 6 ) 


where cJk) = —rx and is the modihed Bessel function of the hrst kind. Here 

||/^||2 = 1 and K > 0 control the location of the mode (maximum) and spread of the density 
over the sphere, respectively. The natural generalization to the group structured domain is 
the periodic mixture density 

2M-1 


/(x;^,k) = /nMF(x;QmA^, 


2M 


K) 


( 7 ) 


m=0 


This Von Mises-Fisher mixture (VMFm) density contains 2M replicates of the Von Mises- 
Fisher density over the sphere centered at all the symmetry-equivalent values of the location 
parameters 

Substitution of ([^ into (|^ and use of the well-known invariance property of maximum 
likelihood estimation ( [Lehmann and Casella (1998)) gives a form for the maximum likelihood 
estimator of grain orientation 0 in terms of the joint maximum likelihood estimators fi and 


K\ 


k 2M-1 

{A, k} = argmax^ ^ HE lmfvMF{^.j', QmA*, 

j=l m=0 


K . 


( 8 ) 


where = (2M) Even though this maximization problem appears daunting, it can be 
iteratively and efficiently computed by applying the well known expectation-maximization 


(EM) procedure for constrained parameter estimation in mixture models (McLachlan and 
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Peel (2004|)). For a full account of this procedure we refer the interested reader to Chen et ah 


(2015). Note that Ijk gives an empirical estimate of the degree of spread of the density about 


the orientation estimate \i Thus, k is a. measure of conhdence of this estimate. 

4 Generation of the Dictionary 

The dictionary approach requires a uniform sampling of orientation space SO{3). Sev¬ 
eral sampling schemes are available in the literature; among the most popular schemes 


are a deterministic sampling method based on the Hopf hbration (Yershova and LaValle 


(2004),Yershova et ah (2009)) and the HEALPix framework (Hierarchical Equal Area iso- 


Latitude Pixelization) (Gorski et ah (2005)). Neither of these approaches is easily adaptable 


for integration with crystallographic symmetry. Instead, we employ a recently developed 
strategy that starts from a simple 3D cubic grid which is mapped uniformly onto SO{3) 


(Roca et ah (2014)). This cubochoric mapping is uniform, rehnable, and isolatitudinal, and 


consists of three steps: 


1. a uniform cubic grid of {2N + 1)^ grid points is generated inside a cube of edge length 
a = 7r2/3; 


2. the cube is divided into six pyramids with apex at the cube center and the cube 
faces as base, and each pyramid is mapped uniformly onto a sextant of a ball, using 


a generalization of the mapping of a square onto a curved square (Roca and Plonka 


( 2011 )); 


3. all points inside the ball are then transformed, using a generalized inverse equal-area 
Lambert mapping, to the unit quaternion Northern hemi-sphere, which is isomorphous 
with SO{3). 
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From the quaternion representation one can readily derive other orientation parame- 
terizations; the Rodrigues parameterization is most suitable for the determination of the 
orientations that belong to the fundamental zone (FZ) for a given crystal symmetry, because 
the boundaries of the FZ are planar. The more conventional Euler angle representation typ¬ 
ically has curved surfaces as the boundaries of the FZ, so that Euler angles are less useful 
for uniform sampling approaches. It should be noted that lower crystal symmetry implies a 
larger dictionary, since the fundamental zone size increases with a reduction in symmetry; 
acceleration of the dot product calculations by means of a GPU (graphical processing unit) 
is a topic of ongoing research. The use of the Rodrigues representation to determine the 
dictionary elements has computational advantages, but care must be taken in the case of 
crystal symmetries with a single diad axis, for which the Rodrigues fundamental zone be¬ 
comes inhnite in the direction normal to the axis. Our approach still produces a uniform 
sampling of orientation space, although all rotations by an angle of 180° are represented by 
points at inhnity (which correspond to points on the outer cube surface in the cubochoric 
representation). 

The dictionary used for the remainder of this paper was generated by setting N = 100, 
and keeping only those orientations for which the corresponding Rodrigues vector lies inside 
the fundamental zone for the octahedral m3m point group. The patterns in the dictionary 
were down sampled to 60 by 80 pixel images. This results in a dictionary V with d = 333226 
elements of dimension 4800. A representative (random) selection of 9 dictionary elements 
in V and Vc is shown in Fig. The left panel of Fig. shows how the sampling points 
are distributed inside the octahedral Rodrigues FZ. The right panel of the hgure shows the 
rate of drop-off of the top 200 inner products in the compensated dictionary for 4000 
randomly selected reference elements. This decay rate is used to select the number of nearest 
neighbors {k = 40) used for the classiher described in Sec. and implemented in Sec. 
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5 Computational Considerations 


The online dictionary matching algorithm requires inner product evaluation between the 
measured sample diffraction patterns and the dictionary diffraction patterns. Let d denote 
the number of patterns in the dictionary (dictionary size), L denote the number of pixels on 
the photodetector (pattern size), and n denote the number of pixels on the sample (sample 
size). The time complexity of calculating the mean inner product over the entire measured 
sample is 0{L{d + n)). 

For the indexing method, to determine the /c-NN dictionary patterns for a given pixel the 
k largest inner products need to be determined from the set of all inner products between the 
pixel and patterns in the dictionary. The time complexity of the whole process is 0{Ldn), 
assuming k d. The computation time and space grow rapidly when the dictionary size 
and sample size become large. 

Computational challenges can be addressed in several ways. The simplest approach is to 
use parallelization and distributed computation. All of the algorithms introduced here can 
be parallelized over the spatial domain since they involve local operations. For example, ML 
orientation estimation is applied independently to each pixel and DT classihcation is applied 
to each spatial patch in the sample. To speed up the k-NN dictionary search one can use 
methods from information retrieval such as dictionary caching and KD trees to accelerate 
the inner product evaluation and ranking process. These methods rely on the similarity of 
diffraction patterns over the /c-NN neighborhood. However, as they rely on approximation, 
these methods may also introduce indexing errors. A study of these and other computational 
trade-offs is important but is outside the scope of this paper. 
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6 Experimental Methods 


To test the dictionary approach against an experimental data set, a polycrystalline INIOO 
Nickel-based snper-alloy sample was selected. The sample was polished using a multi-platen 
Robomet.SD, using a grit of 1 micron diamond slurry on a TexMet cloth and finished with 
a 40 nm colloidal silica slurry on a ChemoMet cloth. Between polishing steps, a water clean 
was used on a ChemoMet cloth. 

A backscattered electron (BSE) image was recorded using a Tescan Vega 3 XMH scanning 
electron microscope outhtted with a LaB^ filament. An EBSD map was obtained with the 
same SEM and a Bruker e-FlashlOOO system. A tilt angle of 70°, voltage of 30kV, working 
distance of 15mm, and emission current of about InA were used to collect the EBSD map. 
The spatial resolution in both x and y directions was 297.7nm, and a Kikuchi pattern of 
80 X 60 pixels was acquired and stored at every point in a 512 x 382 map. 

7 Results 

A dictionary was designed as described in Sec. |^for the octahedral m3m crystal symmetry 
group to match the known characteristics of the INIOO sample and the SEM system, as 
described in Sec. Dictionary inner-products and /c-NN neighborhoods were computed 
from the detected pattern at each scan location. 

We indicate four different scan locations (pixels) with qualitatively different patterns in 
Fig. These locations are representative of the four different clusters of pattern, described 
below (see Fig. |^, that were discovered by the unsupervised DT classifier. In Fig. the 
histograms of the dictionary inner-products are shown for each of these pixels. The ’’shifted 
background” and ” noisy background” pixels have inner-products that are well separated from 
each other in addition to being separated from the inner-products of the ’’grain interior” and 
’’grain boundary” pixels. Thus one might expect that the group of ’’anomalous” pixels. 


20 


represented by the former two, could be easily separated from the group of ’’normal” pixels, 
represented by the latter two, using any reasonable clustering technique based on the inner- 
products. On the other hand, the inner-product histograms for the grain interior pixel and 
the grain boundary pixel are overlapping. This overlap makes it difficult to distinguish these 
two types of pixels and justihes the need for the more sensitive neighborhood similarity 
measure that is better able to separate them. 

Figure shows the full-sample histograms of inner-products p with respect to V and 
neighborhood similarities with respect to "Dc, respectively, over all pixels and over all 
patterns in the dictionaries. The left panel of Fig. can be interpreted as the addition 
of all other inner-product histogram to the four histograms shown in Fig. Similarly to 
Fig.0 the full-sample inner-product histogram exhibits three well separated modes, which 
conhrms that anomalous pixels can easily be separated from the normal pixels on the basis 
of thresholding each pixel’s average inner product measure. The three modes correspond to 
anomalous pixels with inner-products clustered around 0.7 and 0.97 (not visible in the range 
of p plotted) and normal pixels with inner-products clustered around 0.997. 

The right panel of Fig. shows the histogram of all neighborhood similarities computed 
with neighborhood size k = 40. The latter histogram is bimodal and asymmetric about its 
mean. The higher mode located near 37 corresponds to pixels whose /c-NN neighborhood 
in Vc has high overlap with the fc-NN neighborhoods of its 8 adjacent pixels. Such pixels 
are likely to be interior to a grain. The lower mode located near 26 corresponds to patches 
of pixels near grain boundaries, patches that have less similar fc-NN neighborhoods than in¬ 
grain pixels. To separate these tow modes, we htted a two component mixture-of-Gaussian 


model to the histogram in the middle panel using the MoG EM algorithm (McLachlan and 


Peel (2004)). The result of this £t is shown in Fig. The point of intersection of each of 


the htted Gaussian densities (shown in the Figure by vertical dotted line) is used as the DT 
classification threshold for discriminating between grain boundaries and grain interiors. 
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Figure shows the unsupervised DT classiher used to cluster observed patterns. The 
lower four nodes are leaf nodes while the upper three nodes are decision nodes for which 
thresholds are used to assign labels. These thresholds were determined from the observed 
histograms shown in Fig. |^as described above. The DT classihes each pixel on the sample 
based on the pattern matches in the dictionaries. The top node, labeled ’’observation” 
classihes pixel as a ’’anomaly” or as ’’normal” by thresholding the average inner-product p 
between the pattern at the center pixel of the patch and the patterns of the dictionary. Any 
threshold between p = 0.97 and p = 0.99 would separate the normal pair (grain interior, 
grain boundary) from the anomalous pair (noisy background, shifted background) and we 
selected the midpoint. The anomalous patterns are further subclassihed on the left branch 
of the DT by applying a threshold between 0.7 and 0.95 to p. The DT classihes normal 
pixels as either grain-interior or grain-boundary pixels by applying the threshold 32/40 to 
the neighborhood similarities ^^4- Representative patterns are shown at the bottom of Fig.|^ 
that have been identihed as belonging to each of the respective four clusters. 

Figure|^shows the pixel neighborhood similarities (left panel) and the pixel classihcations 
(right panel) as images as determined by the unsupervised DT classiher. In the classihcation 
image the blue/red regions and black regions respectively correspond to pixels classihed as 
anomalies and boundaries. These class labels can be used for segmentation of the grains and 
identihcation of the anomalies. A blowup of these images in Fig. |^is shown in Fig. [^for a 
small region. 

Next we illustrate the use of the dictionary for estimation of the Euler angles in the 
sample. For the same subregion as in Fig.[^ Fig. [^illustrates the OEM (original equipment 
manufacturer) orientation estimates (top left image), an estimate equal to the orientation 
of the element of the dictionary having largest normalized inner product with the pixel 
pattern (top right image), the proposed ML estimator pi of the orientation (VMFm location 
parameter) computed from the top k = 4 matches in the dictionary (bottom left image), the 
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proposed ML VMFm estimator fi computed with fc = 10 (bottom right image). The ML 
estimates k of the VMFm scale parameters were computed for each pixel. This k parameter 
is inversely proportional to the width of the estimated VMFm model over the 3 dimensional 


quaternion sphere. Figure 1^ shows images of these ML estimates translated into angular 
uncertainty (in degrees) by using the transformation = arccos(l — 1/k)^, which is 
the 1/e-width of the VMFm distribution in any fnndamental zone. Note that the OEM 


image in Fig. IT has many spnrious orientation estimates within grains nnlike the proposed 
dictionary based methods. Note also that the ML orientation estimates prodnce smoother 
in-grain orientations. The k = A and fc = 10 ML orientation estimates have low conhdence 
(high variance) at some locations on grain boundaries and in anomalous region at bottom 
right. This low conhdence is qnantihed by the ML estimator of the scale parameter k of the 


VMFm model, shown in Fig. 12 for k = A and k = 10. 


8 Conclusion and Future Work 

We have introduced a novel method for indexing polycrystalline materials that uses both 
mathematical-physics modeling and mathematical-statistics modeling. The physics-based 
forward model is discretized into a dense dictionary of diffraction patterns that are indexed 
by Euler angle triplets. The dictionary is hxed for each crystal symmetry gronp and each 
SEM instrument conhguration. The statistical model is based on the gronp symmetry of 
quaternion representation of the Euler angles on the 3-sphere in 4 dimensions. A feature 
of this method is that it performs classihcation, segmentation, and indexing in the nnified 
framework of dictionary matching. A featnre of the indexing method is that it incorporates 
a concentration parameter that can be estimated jointly with the Euler angles of a pixel or of 
a grain. This concentration parameter can be used to report the degree of confidence one can 
have in the Euler estimates. An iterative maximum likelihood estimator was proposed for 
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estimating the orientation and associated confidence parameters in a statistical Von Mises- 
Fisher mixture model. The method was illustrated on a single sectional slice of a Nickel 
alloy sample. 

As the proposed indexing method is pixel driven it is directly applicable to indexing over 3 
dimensional volumes. Future work will include algorithm acceleration to make full volumetric 
indexing fast enough to be practical. Potential acceleration methods include multi-resolution 
and multi-scale trees, fast coordinate ascent ML optimization, and parallelization. Other 
areas for future work include robustification of the dictionary to model mismatch, sensitivity 
to reductions in detector image resolution, and extensions of the dictionary approach to other 
electron diffraction modalities, such as electron channeling patterns and precession electron 
diffraction. Preliminary investigations indicate that, while dictionary-based classification 
appears to be robust to model mismatch, the proposed dictionary-based indexing algorithm 
is somewhat sensitive to model mismatch. This suggests that the dictionary design may 
need to be fine tuned to the SEM instrument in addition to the samples crystal symmetry 
group. The extension to other SEM modalities such as EDS is also possible but would require 
development of dictionaries that capture other types of data (e.g., spectra). 
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Figure 2: Decision tree for clustering detected patterns on the INIOO sample with examples of 
patterns in each cluster below the leaf nodes at bottom. Physical locations of these patterns 
on the sample are shown in Fig. The classiher uses the uncompensated pattern matches of 
a pixel to decide between shifted and noisy background at lower left. It uses the homogeneity 
of the compensated pattern matches over a 3 x 3 patch to decide between grain boundary 
and grain interior on the right. The number on each decision tree branch is the proportion 
of patterns at the parent node that were classihed with label of child node. 
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Figure 3: A random subset of the 333226 elements in the dictionary generated for the INIOO 
sample. Shown are 9 representative patterns, each 6080 pixels, in the uncompensated (Left) 
and compensated (Right) versions of the dictionary. 
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Figure 4: Left: The sampling pattern (at 1/8 density) of dictionary Rodrigues vectors in 
the fundamental zone (solid lines) of the cubic symmetry point group m3m. Right: Graph 
of the top 200 normalized inner products between the entire compensated INIOO dictionary 
and a randomly selected set of 4000 reference elements in the INIOO dictionary. For each of 
the reference elements the top 200 inner products have been rank ordered in decreasing order 
and plotted. A knee occurs in vicinity of A: = 40 for which the normalized inner product 
drops by at least 1/3 of the maximum value. 
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Figure 5: Illustration of a neighborhood similarity measure that quantihes the overlap be¬ 
tween fc-NN neighborhoods in an image patch. When the patch is inside a grain, the center 
of the patch will have a fc-NN neighborhood that overlaps with the fc-NN neighborhoods 
of the adjacent pixels. A patch that straddles a boundary will have the center pixel fc-NN 
neighborhood overlapping with the neighborhoods of a small number of other pixels. When 
the patch is centered at an anomalous pixel there is little or no overlap between the fc-NN 
neighborhoods of the center and adjacent pixels. 
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Figure 6: Raw SE and EBSD images of INIOO sample generated by the Tescan Vega SEM 
with native OEM software. Left: SE image of the INIOO sample showing physical locations of 
the four patterns shown at bottom of DT classiher in Fig. The inner-product histograms 
for the diffraction patterns at these locations are shown in Fig. Right: IFF colored EBSD 
pixel orientation image. 
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Figure 7: Histograms of the inner products between patterns in the dictionary and the 
patterns of the four EBSD scan locations (pixels) shown in Fig. The histograms for the 
shifted background and the noisy background are well separated from each other and from 
the histograms for the grain boundary and grain interior pixels in Fig. These latter two 
histograms are very concentrated near 1 and overlap each other (not distinguishable at this 
scale). 
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Figure 8: Left: histogram of normalized inner products between detected patterns on the 
sample and dictionary patterns restricted to the range p = [0.99, 0.999] to reveal the modes 
associated with grain interior and grain boundary patterns. Two other modes (not shown) are 
located near p = 0.7 and p = 0.97 corresponding to background shift and noisy background 
pixels, respectively (see Fig. [^. Right; histogram of neighborhood similarity measures 
between dictionary neighborhoods over a 3 x 3 patch centered at each pixel in the sample 
for neighborhood size k = 40. 
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Neighborhood similarity image 



Figure 9: Left: An image rendering of the (un-normalized) neighborhood similarity measure 
{k = 40 nearest neighbors in dictionary) used in the right branch of the DT classiher in 
Fig-i Right: A map of the pattern classes in the INIOO sample as determined by the 
DT classiher in Fig. The colors encode the four classes as follows: white=grain interior, 
black=grain boundary, red=noisy background, and blue=shifted background. Note that the 
black boundaries effectively segment the sample according to crystal orientation. 
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Figure 10: Blowup of a small region right of center in each of the images of Fig. 
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Figure 11: Comparison of orientation indexing. Top left: IFF images generated by OEM 
software. Top right: IPF image obtained by rendering the top matching patterns in the 
dictionary (this is identical to the ML estimator of the orientation using VMFm model 
with k = 1). Bottom left: Image of ML estimates of orientation using VMFm model on 
the orientations of the fc = 4 top dictionary matches. Bottom right: Same as bottom left 
except that k = 10. Note that the OEM image has many spurious orientation estimates 
within grains unlike the other dictionary based methods. Note also that the ML orientation 
estimates produce smoother in-grain orientations. The fc = 4 and fc = 10 ML orientation 
estimates have low confidence (high variance) at some locations on grain boundaries and in 
anomalous region at bottom right. This low confidence is quantified by the ML estimator of 
the scale parameter k of the VMFm model, shown in Fig. [T^for k = A and k = 10. 
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ML VMFm Indexing Uncertainty (k=4) 



ML VMFm Indexing Uncertainty (k=10) 



Figure 12: Images of the ML estimator of the orientation standard deviation (in degrees) 
obtained by ML estimation of the scale parameter k of the VMFm model corresponding to the 
bottom two sub-figures of Fig. The angular standard deviation ranges from 0.05 degrees 
to 0.5 degrees but those values above 0.25 have been hard-limited for ease of visualization 
(only 1% of all values are above 0.25 degrees). Note that the areas of least conhdence are in 
the vicinity of boundaries and anomalies. The highest standard deviations occur at pixels 
that straddle boundaries between grains having the highest misorientation. 
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